
Snowpark integration: Add support for uncollected dataframes in st.map and improve st.table data collection #5590

Merged
merged 8 commits into from Oct 25, 2022

Conversation

sfc-gh-tszerszen
Contributor

@sfc-gh-tszerszen sfc-gh-tszerszen commented Oct 23, 2022

📚 Context

This PR improves the Snowpark integration in Streamlit. It introduces the following changes:

  • Adds support for uncollected Snowpark dataframes and tables in the st.map component
  • Adds support for auto-collecting Snowpark tables (snowflake.snowpark.table.Table) in Streamlit components; previously, only unevaluated DataFrames were collected
  • Limits auto-collection for st.table to 100 rows instead of 10k. Collecting 10k rows in st.table was buggy: the page took a long time to load, the app sometimes crashed, and the rendered table made the page extremely long (discussed with @jrieke)
What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Refactoring
  • Other, please describe:
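The row-capping behavior described above can be sketched in isolation. This is an illustrative sketch, not the actual Streamlit implementation; `_FakeSnowparkDF` is a hypothetical stand-in for `snowflake.snowpark.DataFrame`, which does expose `.limit()` and `.to_pandas()`:

```python
MAX_UNEVALUATED_DF_ROWS = 10_000  # cap used by most elements
MAX_UNEVALUATED_TABLE_ROWS = 100  # tighter cap for st.table, per this PR


class _FakeSnowparkDF:
    """Hypothetical stand-in for snowflake.snowpark.DataFrame."""

    def __init__(self, rows):
        self._rows = rows

    def limit(self, n):
        # Real Snowpark pushes the LIMIT into the Snowflake query, so at
        # most n rows are ever transferred to the client.
        return _FakeSnowparkDF(self._rows[:n])

    def to_pandas(self):
        # Real Snowpark returns a pandas DataFrame; a plain list keeps
        # this sketch dependency-free.
        return self._rows


def collect_capped(df, max_rows):
    """Materialize at most max_rows rows of an unevaluated dataframe."""
    return df.limit(max_rows).to_pandas()


rows = collect_capped(_FakeSnowparkDF(list(range(1_000))), MAX_UNEVALUATED_TABLE_ROWS)
```

Capping via `.limit()` before collection keeps both the data transfer and the rendered page size bounded, which is the motivation for the 100-row cap above.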

🧠 Description of Changes

  • Add bullet points summarizing your changes here

    • This is a breaking API change
    • This is a visible (user-facing) change

Revised:

Insert screenshot of your updated UI/code here

Current:

Insert screenshot of existing UI/code here

🧪 Testing Done

  • Screenshots included
  • Added/Updated unit tests
  • Added/Updated e2e tests

🌐 References

Does this depend on other work, documents, or tickets?

  • Issue: Closes #XXXX

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@sfc-gh-tszerszen sfc-gh-tszerszen self-assigned this Oct 23, 2022
@sfc-gh-tszerszen sfc-gh-tszerszen changed the title Snowpark integration: Add support for snowflake.snowpark.table.Table Snowpark integration Oct 23, 2022
@sfc-gh-tszerszen sfc-gh-tszerszen changed the title Snowpark integration Further improvements int Snowpark integration Oct 23, 2022
@sfc-gh-tszerszen sfc-gh-tszerszen changed the title Further improvements int Snowpark integration Further improvements of Snowpark integration Oct 23, 2022
@sfc-gh-tszerszen sfc-gh-tszerszen force-pushed the add-support-for-snowpark branch 4 times, most recently from 3c17c77 to 5548d42 Compare October 24, 2022 16:38
@sfc-gh-tszerszen sfc-gh-tszerszen changed the title Further improvements of Snowpark integration Snowpark integration: Add support for uncollected dataframes in st.map and improve st.table data collection Oct 24, 2022
@sfc-gh-tszerszen sfc-gh-tszerszen marked this pull request as ready for review October 24, 2022 17:26
return json.dumps(_DEFAULT_MAP)

if hasattr(data, "empty"):
Collaborator


I think this can be simplified a bit:

if hasattr(data, "empty") and data.empty:
    return json.dumps(_DEFAULT_MAP)

And I think we can remove the ignore and the comment from harahu.

Contributor Author


OK, so the version with `and` requires `# type: ignore`, which makes @harahu's comment completely up to date and valid, I think 🤔

Also, you meant `and`, right? Not the `&` operator?

I've updated the PR with a valid example.

Collaborator


Ahh yes, I meant `and`. OK, in that case we can leave the comment. I thought that `hasattr(data, "empty")` would give enough hints to the type checker for this 🤔
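The check under discussion can be sketched standalone. Names like `_DEFAULT_MAP` follow the diff above, but the placeholder value and the `_FakeDF` helper here are hypothetical:

```python
import json

_DEFAULT_MAP = {"initialViewState": {}}  # placeholder; the real default lives in map.py


def default_if_empty(data):
    # hasattr() guards inputs that are not dataframe-like; checking
    # data.empty in the same expression keeps the early return on one path.
    # Under mypy, the attribute access may still need a `# type: ignore`,
    # because hasattr() alone does not narrow the static type of `data`.
    if hasattr(data, "empty") and data.empty:
        return json.dumps(_DEFAULT_MAP)
    return None


class _FakeDF:
    """Hypothetical object mimicking an empty pandas DataFrame."""
    empty = True


result = default_if_empty(_FakeDF())
```

A plain list has no `empty` attribute, so it falls through the guard without raising.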

Collaborator

@LukasMasuch LukasMasuch left a comment


Overall looks good 👍 I added a few comments

@@ -186,7 +194,7 @@ def to_deckgl_json(data: Data, zoom: Optional[int]) -> str:
# the empty attribute. This is either a bug, or the documented data type
# is too broad. One or the other should be addressed, and the ignore
# statement removed.
if data[lon].isnull().values.any() or data[lat].isnull().values.any(): # type: ignore[index]
if data[lon].isnull().values.any() or data[lat].isnull().values.any():
Collaborator


nit: If we can remove the ignore statement here, we can also remove the comment from harahu.

Contributor Author


It's removed 👍

if df.shape[0] == MAX_UNEVALUATED_DF_ROWS:
if (
is_type(df, _SNOWPARK_DF_TYPE_STR)
and not isinstance(df, list)
Collaborator


nit: I think the `and not isinstance(df, list)` is not really necessary here, since if it is of type `_SNOWPARK_DF_TYPE_STR` or `_SNOWPARK_TABLE_TYPE_STR` it cannot be a list anyway. An alternative might be something like:

is_snowpark_dataframe(df) and not isinstance(df, list)

Contributor Author


It's updated now 👍

st.caption(
f"⚠️ Showing only 10k rows. Call `collect()` on the dataframe to show more."
f"⚠️ Showing only {'10k' if max_unevaluated_rows == MAX_UNEVALUATED_DF_ROWS else str(max_unevaluated_rows)} rows. "
Collaborator


nit: at this point, it might be worth adding a function to string_util that automatically does this, e.g.:

def simplify_number(num: int) -> str:
    num_converted = float("{:.2g}".format(num))
    magnitude = 0
    while abs(num_converted) >= 1000:
        magnitude += 1
        num_converted /= 1000.0
    return "{}{}".format(
        "{:f}".format(num_converted).rstrip("0").rstrip("."),
        ["", "k", "m", "b", "t"][magnitude],
    )
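For reference, the suggested helper behaves like this (the definition is repeated so the snippet runs standalone; outputs verified for a few spot-check inputs):

```python
def simplify_number(num: int) -> str:
    num_converted = float("{:.2g}".format(num))  # round to 2 significant digits
    magnitude = 0
    while abs(num_converted) >= 1000:
        magnitude += 1
        num_converted /= 1000.0
    return "{}{}".format(
        "{:f}".format(num_converted).rstrip("0").rstrip("."),
        ["", "k", "m", "b", "t"][magnitude],
    )


print(simplify_number(100))        # -> 100
print(simplify_number(10_000))     # -> 10k
print(simplify_number(1_500_000))  # -> 1.5m
```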

Contributor Author


Your function has been added to string_util and unit tests are added 👍 Thank you 🙇

if data.empty: # type: ignore[union-attr]
return json.dumps(_DEFAULT_MAP)

if type_util.is_snowpark_dataframe(data):
Collaborator


I think we actually can just always call convert_anything_to_df here without the is_snowpark_dataframe check. I don't think there is any reason why map should not have the same support for dataframe-like input types as our other elements. In that case, the data = pd.DataFrame(data) in line 200 would need to be removed and the docstring would need to be updated to cover the same types as st.dataframe.

Contributor Author


Now we always call convert_anything_to_df 👍

@@ -241,9 +242,11 @@ def is_dataframe_like(obj: object) -> TypeGuard[DataFrameLike]:


def is_snowpark_dataframe(obj: object) -> bool:
Collaborator


nit: maybe it is better to rename this to be a bit more generic - e.g. something like `is_snowpark_data_object` - since it is no longer just about the actual Snowpark dataframe.

Contributor Author


It's renamed 👍
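A minimal sketch of what the renamed helper might look like. This is a hypothetical implementation; the real Streamlit code compares fully-qualified type names via its `is_type` helper, which avoids importing snowflake-snowpark-python:

```python
def is_snowpark_data_object(obj: object) -> bool:
    """True if obj is a Snowpark DataFrame or Table.

    Checked by fully-qualified type name, so the snowpark package does
    not need to be installed (or imported) for this check to run.
    """
    t = type(obj)
    full_name = f"{t.__module__}.{t.__qualname__}"
    return full_name in (
        "snowflake.snowpark.dataframe.DataFrame",
        "snowflake.snowpark.table.Table",
    )
```

The duck-typed name check means ordinary objects (lists, strings, pandas DataFrames) simply return False.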

@sfc-gh-tszerszen
Contributor Author

@LukasMasuch thank you for your review 🙇 I've updated the code accordingly 👍

Collaborator

@LukasMasuch LukasMasuch left a comment


LGTM 👍

"""Try to convert different formats to a Pandas Dataframe.

Parameters
----------
df : ndarray, Iterable, dict, DataFrame, Styler, pa.Table, None, dict, list, or any

max_unevaluated_rows: int, If unevaluated data is detected this func will evaluate it,
Collaborator


nit: I think you need to put the description here on a new line.

st.caption(
f"⚠️ Showing only 10k rows. Call `collect()` on the dataframe to show more."
f"⚠️ Showing only {string_util.simplify_number(max_unevaluated_rows)} rows. "
f"Call `collect()` on the dataframe to show more."
Collaborator


nit: I think you can remove the `f` prefix in the second line (since there are no variables to interpolate)
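This works because Python joins adjacent string literals at compile time; only the literal that interpolates a value needs the `f` prefix. A standalone illustration (the value of `simplified` is hypothetical):

```python
simplified = "100"  # stand-in for string_util.simplify_number(max_unevaluated_rows)

caption = (
    f"⚠️ Showing only {simplified} rows. "
    "Call `collect()` on the dataframe to show more."  # plain literal: no f needed
)
```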

@sfc-gh-tszerszen sfc-gh-tszerszen merged commit 082a7ba into develop Oct 25, 2022
@sfc-gh-kmcgrady sfc-gh-kmcgrady deleted the add-support-for-snowpark branch October 5, 2023 19:29